
V 3.1.0 vs 3.0.4

  • entirely rewritten forward and backward substitution for better parallelism and GPU usage (useful with multiple right-hand sides)
  • new CUDA kernels for panel reduction which drastically improve performance
  • added a routine to compute A^T A and a benchmark for the normal equations
  • multiple bugfixes and minor improvements
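
As an illustration of the normal-equations item above, here is a minimal dense sketch in plain Python. It is not the package's routine: the function names (ata, at_times, cholesky, forward_subst, backward_subst, solve_normal_equations) are hypothetical, matrices are plain lists of rows, and A is assumed to have full column rank so that A^T A is positive definite. It forms A^T A, factorizes it, and runs forward and backward substitution on all right-hand-side columns together.

```python
import math

def ata(A):
    """Return the Gram matrix A^T A of a dense matrix stored as a list of rows."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]

def at_times(A, B):
    """Return A^T B, where B holds one right-hand side per column."""
    n, k = len(A[0]), len(B[0])
    return [[sum(A[r][i] * B[r][c] for r in range(len(A))) for c in range(k)]
            for i in range(n)]

def cholesky(G):
    """Return lower-triangular L with L L^T = G (G assumed positive definite)."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(G[j][j] - sum(L[j][p] ** 2 for p in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (G[i][j] - sum(L[i][p] * L[j][p] for p in range(j))) / L[j][j]
    return L

def forward_subst(L, B):
    """Solve L Y = B for every right-hand-side column."""
    n, k = len(L), len(B[0])
    Y = [[0.0] * k for _ in range(n)]
    for i in range(n):
        for c in range(k):
            s = sum(L[i][p] * Y[p][c] for p in range(i))
            Y[i][c] = (B[i][c] - s) / L[i][i]
    return Y

def backward_subst(L, Y):
    """Solve L^T X = Y for every right-hand-side column."""
    n, k = len(L), len(Y[0])
    X = [[0.0] * k for _ in range(n)]
    for i in reversed(range(n)):
        for c in range(k):
            s = sum(L[p][i] * X[p][c] for p in range(i + 1, n))
            X[i][c] = (Y[i][c] - s) / L[i][i]
    return X

def solve_normal_equations(A, B):
    """Least-squares solution of A X = B through (A^T A) X = A^T B."""
    L = cholesky(ata(A))
    return backward_subst(L, forward_subst(L, at_times(A, B)))

# Small overdetermined system with two right-hand sides (illustrative data).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B = [[1.0, 2.0], [3.0, 4.0], [4.0, 6.0]]
X = solve_normal_equations(A, B)
```

The two substitution loops treat the right-hand-side columns independently, which is where the extra parallelism with multiple right-hand sides comes from.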

V 3.0.4 vs 3.0.3

  • improved cmake install scripts and procedure

V 3.0.3 vs 3.0.2

  • minor bugfixes: inconsistent argument passing in the blockaxpy and blockcopy tasks, in spmatmv and in FindAMD

V 3.0.2 vs 3.0.1

  • Raise an error in potrf if the matrix is indefinite
  • better handling of transposition
  • various bug fixes
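
The indefiniteness check mentioned above can be sketched as follows. This is a textbook dense Cholesky in plain Python, not the package's potrf; the names potrf_lower and NotPositiveDefiniteError are hypothetical. The factorization stops as soon as a pivot is not strictly positive, which is the usual way to detect that a symmetric matrix is not positive definite.

```python
import math

class NotPositiveDefiniteError(ValueError):
    """Raised when the factorization meets a non-positive pivot."""

def potrf_lower(G):
    """Return lower-triangular L with L L^T = G, or raise if G is not
    positive definite. G is a symmetric matrix stored as a list of rows."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Pivot of column j after eliminating the previous columns.
        pivot = G[j][j] - sum(L[j][p] ** 2 for p in range(j))
        if pivot <= 0.0:
            raise NotPositiveDefiniteError(
                f"non-positive pivot {pivot!r} at column {j}"
            )
        L[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            L[i][j] = (G[i][j] - sum(L[i][p] * L[j][p] for p in range(j))) / L[j][j]
    return L
```

Calling potrf_lower on [[1.0, 2.0], [2.0, 1.0]] raises the error at the second column, while a positive definite input such as [[4.0, 2.0], [2.0, 3.0]] returns its factor.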

V 3.0.1 vs 3.0

  • Fixed the cmake files, which caused the install step to misbehave on Windows systems

V 3.0 vs 2.0

  • Support for Nvidia GPUs through the StarPU runtime system.
  • Cholesky factorization for solving symmetric positive definite systems.
  • Dynamic, hierarchical partitioning for the QR factorization.
  • Switched to cmake for the build process.
  • Switched to the fstarpumod StarPU module instead of hand-made interfaces and wrappers.
  • Environment variables to set default values for all control parameters.
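
The environment-variable mechanism in the last item can be pictured with the sketch below. The parameter names, their built-in values, and the DEMO_ prefix are all hypothetical and chosen only for illustration; they are not the package's actual variables. The idea is simply that a built-in default is used unless a matching environment variable overrides it.

```python
import os

# Hypothetical control parameters with their built-in defaults.
DEFAULTS = {"block_size": 256, "num_threads": 1, "mem_relax": 1.0}

def load_defaults(environ=None, prefix="DEMO_"):
    """Return the control parameters, letting environment variables such as
    DEMO_BLOCK_SIZE override the built-in defaults."""
    environ = os.environ if environ is None else environ
    params = dict(DEFAULTS)
    for name, builtin in DEFAULTS.items():
        raw = environ.get(prefix + name.upper())
        if raw is not None:
            # Convert the string to the type of the built-in default.
            params[name] = type(builtin)(raw)
    return params
```

Passing a plain dictionary instead of os.environ, for example load_defaults({"DEMO_BLOCK_SIZE": "128"}), makes the override easy to try without touching the real environment.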

V 2.0

Version 2.0 is an almost complete rewrite of the qrmumps package. The main changes with respect to previous versions are:

  • Parallelism is now achieved using the StarPU runtime engine.
  • 2D block partitioning can be used for frontal matrices in combination with communication-avoiding dense factorization algorithms
  • it is possible to bound the memory consumption of the parallel factorization phase
  • pipelining of operations can be achieved through the asynchronous API
  • the error handling has been deeply modified to make it thread-safe
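
To make the 2D block partitioning item concrete, the sketch below cuts an m-by-n dense matrix into a grid of tiles. The function names and the block-size parameters mb and nb are hypothetical; the snippet only shows the index ranges of such a partitioning and says nothing about how the package stores or schedules the tiles.

```python
def tile_ranges(extent, block):
    """Split range(extent) into consecutive half-open (start, stop) intervals
    of length `block`; the last one may be shorter."""
    return [(s, min(s + block, extent)) for s in range(0, extent, block)]

def partition_2d(m, n, mb, nb):
    """Return a grid of tiles for an m-by-n matrix. Each tile is a pair
    ((row_start, row_stop), (col_start, col_stop))."""
    return [[(rows, cols) for cols in tile_ranges(n, nb)]
            for rows in tile_ranges(m, mb)]
```

For instance, partition_2d(5, 4, 2, 3) gives a 3-by-2 grid whose last tile covers row 4 and column 3 only. With tiles like these, a dense factorization can be expressed as many small tasks on individual tiles rather than on whole block columns.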

V 1.2

  • Added a method to extract the R factor once the factorization is computed.

V 1.1

  • There is no longer a limit on the number of concurrent instances of qrmspmatc in the C interface
  • A number of minor bugfixes